Fast Stochastic Variance Reduced Gradient Method with Momentum Acceleration for Machine Learning
Authors
Abstract
Recently, research on accelerated stochastic gradient descent methods (e.g., SVRG) has made exciting progress (e.g., linear convergence for strongly convex problems). However, the best-known methods (e.g., Katyusha) require at least two auxiliary variables and two momentum parameters. In this paper, we propose a fast stochastic variance reduced gradient (FSVRG) method, in which we design a novel update rule with Nesterov's momentum and incorporate the technique of growing epoch size. FSVRG has only one auxiliary variable and one momentum weight, so it is much simpler and has much lower per-iteration complexity. We prove that FSVRG achieves linear convergence for strongly convex problems and the optimal O(1/T^2) convergence rate for non-strongly convex problems, where T is the number of outer iterations. We also extend FSVRG to directly solve problems with non-smooth component functions, such as SVM. Finally, we empirically study the performance of FSVRG on various machine learning problems such as logistic regression, ridge regression, Lasso and SVM. Our results show that FSVRG outperforms state-of-the-art stochastic methods, including Katyusha.
Keywords: Stochastic optimization, variance reduction, momentum acceleration, non-strongly convex, non-smooth
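The abstract describes FSVRG only at a high level: SVRG-style variance reduction, a single auxiliary variable, a single Nesterov-type momentum weight, and a growing epoch size. The following pure-Python sketch illustrates that combination on ridge regression, one of the problems the abstract lists. It is not the paper's update rule. The coupling x = x̃ + θ(y − x̃), the step size η = 0.25/L_max, the θ recursion (the standard Nesterov sequence), the growth factor `rho = 1.5`, the synthetic data, and every function name (`svrg_momentum`, `make_ridge_data`, `grad_i`, `full_grad`, `loss`) are assumptions made for illustration.

```python
import math
import random


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def make_ridge_data(n=50, d=5, seed=0):
    """Hypothetical synthetic ridge-regression data with unit-norm rows."""
    rng = random.Random(seed)
    w_true = [rng.gauss(0.0, 1.0) for _ in range(d)]
    A, b = [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(dot(row, row))
        row = [v / norm for v in row]
        A.append(row)
        b.append(dot(row, w_true) + 0.01 * rng.gauss(0.0, 1.0))
    return A, b


def grad_i(x, a, bi, lam):
    """Gradient of f_i(x) = 0.5 * (a.x - b_i)^2 + 0.5 * lam * ||x||^2."""
    r = dot(a, x) - bi
    return [r * aj + lam * xj for aj, xj in zip(a, x)]


def full_grad(x, A, b, lam):
    """Average of the component gradients, i.e. the gradient of the ridge objective."""
    g = [0.0] * len(x)
    for a, bi in zip(A, b):
        g = [gj + gij / len(A) for gj, gij in zip(g, grad_i(x, a, bi, lam))]
    return g


def loss(x, A, b, lam):
    data = sum(0.5 * (dot(a, x) - bi) ** 2 for a, bi in zip(A, b)) / len(A)
    return data + 0.5 * lam * dot(x, x)


def svrg_momentum(A, b, lam, epochs=10, m1=50, rho=1.5, seed=1):
    """Hypothetical SVRG variant with one auxiliary vector y, one momentum
    weight theta, and an epoch length multiplied by rho after every epoch.
    This is an illustration, not the exact FSVRG update rule from the paper."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    L_max = max(dot(a, a) for a in A) + lam   # largest component smoothness constant
    eta = 0.25 / L_max                          # conservative step size (assumption)
    x_snap = [0.0] * d                          # snapshot point (x-tilde)
    y = list(x_snap)                            # the single auxiliary variable
    theta = 1.0                                 # the single momentum weight
    m = m1
    for _ in range(epochs):
        mu = full_grad(x_snap, A, b, lam)       # full gradient at the snapshot
        x = [s + theta * (yj - s) for s, yj in zip(x_snap, y)]
        x_sum = [0.0] * d
        for _ in range(m):
            i = rng.randrange(n)
            # SVRG variance-reduced estimate of the gradient at x
            g = [p - q + r for p, q, r in
                 zip(grad_i(x, A[i], b[i], lam), grad_i(x_snap, A[i], b[i], lam), mu)]
            # momentum step on y; x is then a convex combination of snapshot and y
            y = [yj - (eta / theta) * gj for yj, gj in zip(y, g)]
            x = [s + theta * (yj - s) for s, yj in zip(x_snap, y)]
            x_sum = [t + xj for t, xj in zip(x_sum, x)]
        x_snap = [t / m for t in x_sum]         # new snapshot: average inner iterate
        theta = (math.sqrt(theta ** 4 + 4 * theta ** 2) - theta ** 2) / 2
        m = math.ceil(rho * m)                  # growing epoch size
    return x_snap


if __name__ == "__main__":
    A, b = make_ridge_data()
    x_hat = svrg_momentum(A, b, lam=0.1)
    print("loss at 0:", loss([0.0] * len(x_hat), A, b, 0.1))
    print("loss at x_hat:", loss(x_hat, A, b, 0.1))
```

In this sketch x = x̃ + θ(y − x̃), so each inner step moves x by exactly −η·g. Within an epoch the loop therefore behaves like plain SVRG. The single auxiliary y and weight θ act only at epoch boundaries, where the new starting point extrapolates from the previous snapshot.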
Similar resources
Stochastic Proximal Gradient Descent with Acceleration Techniques
Proximal gradient descent (PGD) and stochastic proximal gradient descent (SPGD) are popular methods for solving regularized risk minimization problems in machine learning and statistics. In this paper, we propose and analyze an accelerated variant of these methods in the mini-batch setting. This method incorporates two acceleration techniques: one is Nesterov’s acceleration method, and the othe...
Finite Sum Acceleration vs. Adaptive Learning Rates for the Training of Kernel Machines on a Budget
Training predictive models with stochastic gradient descent is widespread practice in machine learning. Recent advances improve on the basic technique in two ways: adaptive learning rates are widely used for deep learning, while acceleration techniques like stochastic average and variance reduced gradient descent can achieve a linear convergence rate. We investigate the utility of both types of...
Stochastic Variance-Reduced ADMM
The alternating direction method of multipliers (ADMM) is a powerful optimization solver in machine learning. Recently, stochastic ADMM has been integrated with variance reduction methods for stochastic gradient, leading to SAG-ADMM and SDCA-ADMM that have fast convergence rates and low iteration complexities. However, their space requirements can still be high. In this paper, we propose an inte...
Fast-and-Light Stochastic ADMM
The alternating direction method of multipliers (ADMM) is a powerful optimization solver in machine learning. Recently, stochastic ADMM has been integrated with variance reduction methods for stochastic gradient, leading to SAG-ADMM and SDCA-ADMM that have fast convergence rates and low iteration complexities. However, their space requirements can still be high. In this paper, we propose an int...
Fast Asynchronous Parallel Stochastic Gradient Decent
Stochastic gradient descent (SGD) and its variants have become more and more popular in machine learning due to their efficiency and effectiveness. To handle large-scale problems, researchers have recently proposed several parallel SGD methods for multicore systems. However, existing parallel SGD methods cannot achieve satisfactory performance in real applications. In this paper, we propose a f...
Journal title:
- CoRR
Volume: abs/1703.07948  Issue:
Pages: -
Publication date: 2017